P1. Posterior Probability Distribution
Task: Consider now that x is distributed as x ~ N(μ, 16), and that the prior for the mean is μ ~ N(0, 4). Use the distribution N(7, 16) to generate observations of x.
- Develop an algorithm that estimates the posterior distribution’s mean and variance, assuming we have available N= 1, 5, 10, 20, 50, 100 and 1000 observations, respectively.
- For every N, provide a diagram that shows the prior distribution, the distribution generating the data, and the estimated posterior distribution.
Implementation:
The algorithm is straightforward: it applies the closed-form expressions for the posterior mean and variance derived above.
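A minimal sketch of that update (in Python; the function name `posterior_mean_var` is illustrative, and the second parameter of each Normal is read as a variance):

```python
import numpy as np

def posterior_mean_var(x, prior_mean=0.0, prior_var=4.0, lik_var=16.0):
    # Conjugate Gaussian update for the mean with known likelihood variance:
    # posterior precision = prior precision + N / likelihood variance.
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / lik_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(x) / lik_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
for n in (1, 5, 10, 20, 50, 100, 1000):
    x = rng.normal(7.0, np.sqrt(16.0), size=n)  # observations from N(7, 16)
    m, v = posterior_mean_var(x)
    print(f"N={n:5d}  posterior mean={m:.3f}  posterior variance={v:.4f}")
```

As N grows, the posterior variance shrinks toward 16/N and the posterior mean moves from the prior mean 0 toward the sample mean near 7.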
Running the algorithm produces the following graphs:
[Figures: prior, data-generating, and estimated posterior distributions for each value of N]
P2. Polynomial Curve Fitting
Task: Draw a period of the sinusoidal function y(x) = sin(2πx) and select N samples of x uniformly distributed in the interval [0, 1]. To every y(x) value add Gaussian noise distributed as N(0, 1) to generate a set of observations.
- Fit to the noisy observations a polynomial model of degree M = 2, 3, 4, 5, or 9, and provide a table with the coefficients of the best least-squares fit and the achieved RMSE.
- Provide a plot showing the function y(x), the observations drawn, and the best fit model for every different value of M.
- Repeat the above procedure for N=10 and N=100.
Implementation:
The algorithm was implemented for three values of N (25, 10, and 100) and for polynomial degrees M = 2, 3, 4, 5, and 9. The model coefficients were estimated by least squares.
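The fitting step can be sketched as follows (in Python; `fit_poly` is an illustrative name, and the noise is N(0, 1) as stated in the task):

```python
import numpy as np

def fit_poly(x, t, M):
    # Design matrix with columns 1, x, ..., x^M; least-squares weights.
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    rmse = np.sqrt(np.mean((Phi @ w - t) ** 2))
    return w, rmse

rng = np.random.default_rng(1)
N = 25
x = rng.uniform(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 1.0, N)
for M in (2, 3, 4, 5, 9):
    w, rmse = fit_poly(x, t, M)
    print(f"M={M}: RMSE={rmse:.3f}")
```

Training RMSE decreases monotonically with M, which is why the M = 9 fits below have the smallest residuals while visibly overfitting the noise.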
N = 25
Weights for N = 25:
| Weights | M2 | M3 | M4 | M5 | M9 |
|---|---|---|---|---|---|
| w0 | 1.551613 | -1.13713 | -0.2993927 | -1.780131 | -1.499296 |
| w1 | -5.243373 | 25.37546 | 12.1534910 | 42.439316 | 34.864005 |
| w2 | 3.201364 | -66.77343 | -12.9503574 | -199.220992 | -132.150306 |
| w3 | NA | 43.32511 | -35.4599000 | 429.877872 | 174.062662 |
| w4 | NA | NA | 37.7025812 | -466.311479 | -14.101311 |
| w5 | NA | NA | NA | 196.851378 | -113.825112 |
| w6 | NA | NA | NA | NA | -40.200131 |
| w7 | NA | NA | NA | NA | 70.458280 |
| w8 | NA | NA | NA | NA | 86.706776 |
| w9 | NA | NA | NA | NA | -62.496632 |
[Figure: y(x), observations, and best-fit models for N = 25]
N = 10
Weights for N = 10:
| Weights | M2 | M3 | M4 | M5 | M9 |
|---|---|---|---|---|---|
| w0 | 0.342139 | -1.656795 | -2.436627 | -2.009874 | -3.134021 |
| w1 | 2.468431 | 22.022072 | 34.079037 | 26.614343 | 43.727136 |
| w2 | -4.080392 | -48.872120 | -97.669662 | -59.999460 | -118.178395 |
| w3 | NA | 28.343264 | 99.483133 | 20.552445 | 55.051207 |
| w4 | NA | NA | -34.032065 | 39.376230 | 82.728812 |
| w5 | NA | NA | NA | -25.132615 | 11.075251 |
| w6 | NA | NA | NA | NA | -56.311970 |
| w7 | NA | NA | NA | NA | -69.493005 |
| w8 | NA | NA | NA | NA | -21.053940 |
| w9 | NA | NA | NA | NA | 76.968799 |
[Figure: y(x), observations, and best-fit models for N = 10]
N = 100
Weights for N = 100:
| Weights | M2 | M3 | M4 | M5 | M9 |
|---|---|---|---|---|---|
| w0 | 1.4275230 | 0.098293 | -0.0150933 | -0.340045 | -0.249578 |
| w1 | -3.3739325 | 10.675388 | 12.4554136 | 20.179368 | 16.389805 |
| w2 | 0.9378136 | -31.876898 | -39.1585183 | -90.886316 | -53.898997 |
| w3 | NA | 20.985681 | 31.7178893 | 167.869434 | 38.019581 |
| w4 | NA | NA | -5.1755582 | -157.587446 | 22.747595 |
| w5 | NA | NA | NA | 60.809248 | -8.326979 |
| w6 | NA | NA | NA | NA | -20.004700 |
| w7 | NA | NA | NA | NA | -13.345086 |
| w8 | NA | NA | NA | NA | 1.858392 |
| w9 | NA | NA | NA | NA | 17.120006 |
[Figure: y(x), observations, and best-fit models for N = 100]
P3. Predictive Bayesian
Task: For the same setup as in Problem 2 above, assume that the observations are generated as t = y(x) + η, where y(x) = sin(2πx) and the Gaussian noise η is distributed as N(0, β⁻¹) with β = 11.1. You are given a dataset generated in this way with N = 10 samples (x, t), where 0 < x < 1. Assume that you want to fit to the data a regression model of the form t = g(x, w) + η, where g(x, w) is a polynomial of degree M = 9 whose coefficient vector w follows a Normal prior distribution with precision α = 0.005 (Bayesian approach).
Construct the predictive model which, for every unseen x (not in the training set), produces a prediction t. Plot the mean m(x) and variance s²(x) of the predictive Gaussian model for many different values of x in the interval 0 < x < 1. What do you observe? Discuss your findings.
Implementation:
In order to fully incorporate the Bayesian approach for this task, the following formulas were used:
- For the covariance matrix S of the posterior distribution over the coefficients: \({{\bf\mathcal{S}}_N^{-1} = \alpha{\bf I} + \beta{\bf\Phi}^T{\bf\Phi}}\)
- For the mean of the posterior distribution over the coefficients: \({{\bf m}_N = \beta{\bf\mathcal{S}}_N{\bf\Phi}^T{\bf t}}\)
- For the variance of the predictive distribution at a test point x: \({\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T{\bf\mathcal{S}}_N \phi(x)}\)
- For the mean of the predictive distribution at a test point x: \({m(x) = {\bf m}_N^T \phi(x)}\)
The four equations were combined in a single function, `bayes.predict`, that computes the predictive distribution. The approach was tested for various numbers of observations: N = 10, 50, 100, 200, and 1000.